Synonyms and Antonyms: Embedded Conflict
Since modern word embeddings are motivated by a distributional hypothesis and
are, therefore, based on local co-occurrences of words, it is only to be
expected that synonyms and antonyms can have very similar embeddings and, it
would seem, cannot be told apart. Contrary to this widespread assumption, this
paper shows that modern embeddings do contain information that distinguishes
synonyms from antonyms, despite the small cosine distances between the
corresponding vectors. This information is encoded in the geometry of the
embeddings and can be extracted with a manifold-learning procedure, or
contrasting map. Such a map is trained on a small labeled subset of the data
and produces new embeddings that explicitly highlight specific semantic
attributes of a word. The new embeddings produced by the map are shown to
improve performance on downstream tasks.
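The abstract does not spell out how such a contrasting map is trained; one plausible realization is a small neural map fit with a contrastive margin loss on the labeled synonym/antonym pairs. The PyTorch sketch below is an illustrative assumption, not the paper's exact procedure; the architecture, loss, and synthetic data are all stand-ins.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins: in practice `u` and `v` would hold the
# pretrained embeddings of word pairs, with label 1 for synonyms
# and 0 for antonyms (the "small labeled subset" of the abstract).
dim, n_pairs = 300, 256
u = torch.randn(n_pairs, dim)
v = torch.randn(n_pairs, dim)
labels = torch.randint(0, 2, (n_pairs,)).float()

# A small nonlinear map from the original embedding space into a
# space where the synonym/antonym distinction is explicit.
contrast_map = nn.Sequential(
    nn.Linear(dim, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
)

optimizer = torch.optim.Adam(contrast_map.parameters(), lr=1e-3)
margin = 1.0

for _ in range(50):
    fu, fv = contrast_map(u), contrast_map(v)
    dist = torch.norm(fu - fv, dim=1)
    # Contrastive margin loss: pull synonym pairs together, push
    # antonym pairs at least `margin` apart.
    loss = (labels * dist.pow(2)
            + (1 - labels) * torch.clamp(margin - dist, min=0).pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Once trained, `contrast_map` can re-embed any word vector, and distances in the mapped space, rather than cosine similarity in the original one, separate synonym pairs from antonym pairs.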
Fine-Tuning Transformers: Vocabulary Transfer
Transformers are responsible for the vast majority of recent advances in
natural language processing, and most practical applications of these models
are enabled through transfer learning. This paper studies whether
corpus-specific tokenization used for fine-tuning improves the resulting
performance of the model. Through a series of experiments, we demonstrate that
such tokenization, combined with an initialization and fine-tuning strategy
for the new vocabulary tokens, speeds up the transfer and boosts the
performance of the fine-tuned model. We call this aspect of transfer
facilitation vocabulary transfer.
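The abstract leaves the tokenization and initialization details open; a common heuristic consistent with the description is to retrain the tokenizer on the fine-tuning corpus and initialize each new token's embedding from the old embeddings of its pieces. The sketch below, using the Hugging Face transformers API, is a hedged illustration: the checkpoint, the `corpus` iterable, and the averaging rule are assumptions rather than the paper's exact recipe.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Illustrative assumptions: a BERT checkpoint and an in-domain
# corpus exposed as an iterable of strings.
old_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
corpus = ["... replace with in-domain text ..."]

# 1. Corpus-specific tokenization: retrain the tokenizer on the
#    fine-tuning corpus with the same algorithm and vocabulary size.
new_tokenizer = old_tokenizer.train_new_from_iterator(
    corpus, vocab_size=old_tokenizer.vocab_size)

# 2. Initialize embeddings for the new vocabulary: a token kept from
#    the old vocabulary reuses its pretrained vector; a genuinely new
#    token gets the mean of the old embeddings of its pieces (a naive
#    rule that ignores subword markers such as "##").
old_emb = model.get_input_embeddings().weight.data
old_vocab = old_tokenizer.get_vocab()
new_emb = old_emb.new_empty(len(new_tokenizer), old_emb.size(1))
for token, idx in new_tokenizer.get_vocab().items():
    if token in old_vocab:
        new_emb[idx] = old_emb[old_vocab[token]]
    else:
        piece_ids = old_tokenizer(token, add_special_tokens=False)["input_ids"]
        new_emb[idx] = (old_emb[piece_ids].mean(dim=0)
                        if piece_ids else old_emb.mean(dim=0))

model.resize_token_embeddings(len(new_tokenizer))
model.get_input_embeddings().weight.data.copy_(new_emb)

# 3. Fine-tune on the corpus as usual, now with `new_tokenizer`.
```

After this initialization, ordinary fine-tuning proceeds with `new_tokenizer`; tokens shared with the original vocabulary start from their pretrained vectors, so only the genuinely new tokens must be learned from scratch.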